ABSTRACT

Differential privacy is a promising privacy-preserving paradigm for 
statistical query processing over sensitive data. It works by inject- 
ing random noise into each query result, such that it is provably 
hard for the adversary to infer the presence or absence of any indi- 
vidual record from the published noisy results. The main objective 
in differentially private query processing is to maximize the accu- 
racy of the query results, while satisfying the privacy guarantees. 
Previous work, notably the matrix mechanism [16], has suggested 
that processing a batch of correlated queries as a whole can po- 
tentially achieve considerable accuracy gains, compared to answer- 
ing them individually. However, as we point out in this paper, the 
matrix mechanism is mainly of theoretical interest; in particular, 
several inherent problems in its design limit its accuracy in prac- 
tice, which almost never exceeds that of naive methods. In fact, 
we are not aware of any existing solution that can effectively op- 
timize a query batch under differential privacy. Motivated by this, 
we propose the Low-Rank Mechanism (LRM), the first practical 
differentially private technique for answering batch queries with 
high accuracy, based on a low rank approximation of the workload 
matrix. We prove that the accuracy provided by LRM is close to 
the theoretical lower bound for any mechanism to answer a batch of 
queries under differential privacy. Extensive experiments using real 
data demonstrate that LRM consistently outperforms state-of-the- 
art query processing solutions under differential privacy, by large 
margins. 

1. INTRODUCTION 

Differential privacy [11] is an emerging paradigm for publishing
statistical information over sensitive data, with strong and rigorous
guarantees on individuals' privacy. Since its proposal, differential
privacy has attracted extensive research efforts in areas such as
cryptography [11], algorithms [12, 14, 21], databases [8, 15, 16, 24, 27,
28, 29], data mining [1, 13] and machine learning [3, 4, 25]. The main
idea of differential privacy is to inject random noise into aggregate
query results, such that the adversary cannot infer, with high
confidence, the presence or absence of any given record r in the
dataset, even if the adversary knows all other records in the dataset
except for r. This paper follows a popular definition of differential
privacy, called $\epsilon$-differential privacy, in which the adversary's
maximum confidence in inferring private information is controlled
by a user-specified parameter $\epsilon$ called the privacy budget. Given
$\epsilon$, the main goal of query processing under $\epsilon$-differential
privacy is to maximize the utility/accuracy of the (noisy) query answers,
while satisfying the above privacy requirements.

This work focuses on a common class of queries called linear 
counting queries, which is the basic operation in many statistical 
analyses. Similar ideas apply to other types of linear queries, e.g., 
linear sums. Figure 1(a) illustrates an example electronic medical 
record database, where each record corresponds to an individual. 
Figure 1(b) shows the exact number of HIV+ patients in each state, 
which we refer to as unit counts. A linear counting query in this
example can be any linear combination of the unit counts. For
instance, let $x_{NY}$, $x_{NJ}$, $x_{CA}$, $x_{WA}$ be the patient counts
in states NY, NJ, CA, and WA respectively; one possible linear counting
query is $x_{NY} + x_{NJ} + x_{CA} + x_{WA}$, which computes the total
number of HIV+ patients in the four states listed in our example.
Another example linear counting query is $x_{NY}/19 + x_{NJ}/8 +
x_{CA}/37$, which calculates the weighted average of patient counts
in states NY, NJ and CA, with weights set according to their
respective population sizes. In general, we are given a database with
n unit counts, and a batch QS of m linear counting queries. The
goal is to answer all queries in QS under $\epsilon$-differential privacy,
and maximize the expected overall accuracy of the queries.

State    # of HIV+ patients
NY       82,700
NJ       19,000
CA       67,000
WA       5,900

(a) Patient records    (b) Statistics on HIV+ patients

Figure 1: Example medical record database

Straightforward approaches to answering a batch of linear counting
queries usually lead to sub-optimal result accuracy. One naive
solution, referred to as noise on queries (NOQ), is to process each
query independently, e.g., using the Laplace mechanism [11]. This
method fails to exploit the correlations between different queries.
Consider a batch of three different queries $q_1 = x_{NY} + x_{NJ} +
x_{CA} + x_{WA}$, $q_2 = x_{NY} + x_{NJ}$, and $q_3 = x_{CA} + x_{WA}$.
Clearly, the three queries are correlated since $q_1 = q_2 + q_3$. Thus,
an alternative strategy for answering these queries is to process only
$q_2$ and $q_3$, and use their sum to answer $q_1$. As will be explained
in Section 3, the amount of noise added to query results depends upon the
sensitivity of the query set, which is defined as the maximum possible
total change in query results caused by adding or removing a single
record in the original database. In our example, the sensitivity
of the query set $\{q_2, q_3\}$ is 1, because adding/removing a patient
record in Figure 1(a) affects at most one of $q_2$ and $q_3$ (i.e., $q_2$
if the record is associated with state NY or NJ, and $q_3$ if the state is
CA or WA), by exactly 1. On the other hand, the query set
$\{q_1, q_2, q_3\}$ has a sensitivity of 2, since a record in the above 4
states affects both $q_1$ and one of $q_2$ and $q_3$. According to the
Laplace mechanism, the variance of the noise added to each query is
$2\Delta^2/\epsilon^2$, where $\Delta$ is the sensitivity of the query set,
and $\epsilon$ is the user-specified privacy budget. Therefore, processing
$\{q_1, q_2, q_3\}$ directly incurs a noise variance of $8/\epsilon^2$ for
each query; on the other hand, executing $\{q_2, q_3\}$ leads to a noise
variance of $2/\epsilon^2$ for each of $q_2$ and $q_3$, and their sum
$q_1 = q_2 + q_3$ has a noise variance of $2 \times 2/\epsilon^2 =
4/\epsilon^2$. Clearly, the latter method obtains higher accuracy for all
queries.

Another simple solution, referred to as noise on data (NOD), is
to process each unit count under differential privacy, and combine
them to answer the given linear counting queries. Continuing the
example, this method computes the noisy counts for $x_{NY}$, $x_{NJ}$,
$x_{CA}$ and $x_{WA}$, and uses their linear combinations to answer $q_1$,
$q_2$, and $q_3$. This approach overlooks the correlations between
different unit counts. In our example, $x_{NY}$ and $x_{NJ}$ (and
similarly, $x_{CA}$ and $x_{WA}$) are either both present or both absent in
every query, and, thus, can be seen as a single entity. Processing them as
independent queries incurs unnecessary accuracy costs when re-combining
them. In the example, NOD adds noise with variance $2/\epsilon^2$ to each
unit count, and their combinations to answer $q_1$, $q_2$, and $q_3$ have
noise variance $8/\epsilon^2$, $4/\epsilon^2$ and $4/\epsilon^2$
respectively. NOD's result utility is also worse than the above-mentioned
strategy of processing $q_2$ and $q_3$, and adding their results to answer
$q_1$.
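
The three variance figures quoted for NOQ, NOD and the strategy that answers only the last two queries can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it treats every strategy as a pair (B, L) with W = BL, which is the notation the paper introduces later in Section 4, and the helper name `per_query_variance` is a hypothetical name chosen for this example.

```python
import numpy as np

def per_query_variance(B, L, eps=1.0):
    """Noise variance of each query when Laplace noise is added to L @ D
    and the original queries are recovered as B @ (noisy L @ D)."""
    B = np.asarray(B, dtype=float)
    L = np.asarray(L, dtype=float)
    sensitivity = np.abs(L).sum(axis=0).max()   # largest column absolute sum of L
    lap_var = 2.0 * (sensitivity / eps) ** 2     # variance of Lap(sensitivity / eps)
    return lap_var * (B ** 2).sum(axis=1)        # independent noise terms add up

# Columns: NY, NJ, CA, WA.  Rows: q1, q2, q3 from the running example.
W = np.array([[1, 1, 1, 1],
              [1, 1, 0, 0],
              [0, 0, 1, 1]])

noq = per_query_variance(np.eye(3), W)      # noise on queries: answer W directly
nod = per_query_variance(W, np.eye(4))      # noise on data: perturb the unit counts

S = W[1:]                                   # strategy: answer only q2 and q3
B = np.array([[1, 1], [1, 0], [0, 1]])      # q1 is recovered as q2 + q3
assert np.array_equal(B @ S, W)
strategy = per_query_variance(B, S)

print("NOQ     :", noq)        # 8, 8, 8 (in units of 1/eps^2)
print("NOD     :", nod)        # 8, 4, 4
print("strategy:", strategy)   # 4, 2, 2
```

With eps set to 1 the printed values are the coefficients of $1/\epsilon^2$ stated in the text.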

In general, the query set QS may exhibit complex correlations
among different queries and among different unit counts. As a
consequence, it is non-trivial to obtain the best strategy to answer
QS under differential privacy. For instance, consider the following
query set:

$q_1 = 2x_{NJ} + x_{CA} + x_{WA}$

$q_2 = x_{NJ} + 2x_{WA}$

$q_3 = x_{NY} + 2x_{CA} + 2x_{WA}$

NOQ is clearly a poor choice, since it incurs a sensitivity of 5
(e.g., a record of state WA affects $q_1$ by 1, and $q_2$ and $q_3$ by 2
each). The sensitivity of NOD remains 1, and it answers $q_1$, $q_2$,
and $q_3$ with noise variance $12/\epsilon^2$, $10/\epsilon^2$ and
$18/\epsilon^2$ respectively, leading to a sum-square error (SSE) of
$40/\epsilon^2$. The optimal strategy in terms of SSE in this case computes
the noisy results of $x_{NJ}$ and $x_{WA}$, as well as
$q'_1 = x_{NY}/3 + x_{CA}$ and $q'_2 = 2x_{NY}/3$. Then, it obtains the
results for $q_1$, $q_2$, and $q_3$ as follows.

$q_1 = q'_1 + 2x_{NJ} + x_{WA} - q'_2/2$

$q_2 = x_{NJ} + 2x_{WA}$

$q_3 = 2q'_1 + 2x_{WA} + q'_2/2$

The sensitivity of the above method is also 1, and it answers
$q_1$, $q_2$, and $q_3$ with noise variance $12.5/\epsilon^2$,
$10/\epsilon^2$ and $16.5/\epsilon^2$ respectively, resulting in an SSE of
$39/\epsilon^2$. Observe that there is no simple pattern in the query set
or the optimal strategy. Since there is an infinite space of possible
strategies, searching for the best one is a challenging problem.
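
The numbers in this second example can be checked mechanically. The following is a minimal sketch, assuming the strategy is encoded as two matrices: `L` holds the four noisy quantities ($x_{NJ}$, $x_{WA}$, $q'_1$, $q'_2$) and `B` holds the recombination weights read off the three equations above. The function name `sse` and the matrix layout are choices made for this illustration.

```python
import numpy as np

def sse(B, L, eps=1.0):
    """Sum-square error when Lap(sensitivity / eps) noise is added to each
    row of L @ D and the queries are then computed as B @ (noisy L @ D)."""
    sensitivity = np.abs(L).sum(axis=0).max()
    return 2.0 * (sensitivity / eps) ** 2 * (B ** 2).sum()

# Columns: NY, NJ, CA, WA.  Rows: q1, q2, q3.
W = np.array([[0., 2., 1., 1.],
              [0., 1., 0., 2.],
              [1., 0., 2., 2.]])

# Strategy rows: x_NJ, x_WA, q1' = x_NY/3 + x_CA, q2' = 2 x_NY/3.
L = np.array([[0.,  1., 0., 0.],
              [0.,  0., 0., 1.],
              [1/3, 0., 1., 0.],
              [2/3, 0., 0., 0.]])

# Weights of (x_NJ, x_WA, q1', q2') in q1, q2, q3.
B = np.array([[2., 1., 1., -0.5],
              [1., 2., 0.,  0.0],
              [0., 2., 2.,  0.5]])

assert np.allclose(B @ L, W)                 # the strategy reproduces the workload

print("NOQ SSE     :", sse(np.eye(3), W))    # sensitivity 5, three queries
print("NOD SSE     :", sse(W, np.eye(4)))    # 40 in units of 1/eps^2
print("strategy SSE:", sse(B, L))            # 39 in units of 1/eps^2
```

The check confirms that the strategy has sensitivity 1 and that its SSE is lower than NOD's, in line with the figures quoted in the text.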

Li et al. [16] investigate the problem of identifying a good
strategy for answering these kinds of query sets under differential
privacy, and they propose a solution called the matrix mechanism.
This solution, however, is mainly of theoretical interest, for two
reasons. First, it incurs an enormous computational overhead that limits
its applicability to very small data and query sets. Second and more
importantly, the matrix mechanism often generates sub-optimal strategies
for answering queries; in particular, its practical performance (in terms
of query accuracy) almost never exceeds that of the naive solution NOD.
Motivated by this, we propose a novel solution called the low-rank
mechanism (LRM), based on the theory of low-rank matrix approximation. We
prove that the accuracy provided by LRM is within a constant factor of the
theoretical lower bound established in [14]. Extensive experiments
demonstrate that LRM significantly outperforms existing solutions
in terms of result accuracy, sometimes by orders of magnitude.

The rest of the paper is organized as follows. Section 2 reviews 
previous studies on differential privacy. Section 3 provides formal 
definitions for our problem. Section 4 presents the mechanism for- 
mulation of LRM, and analyzes its optimality. Section 5 discusses 
how to solve the optimization problem in LRM. Section 6 verifies 
the superiority of our proposal through an extensive experimental 
study. Finally, Section 7 concludes the paper. 

2. RELATED WORK 

Section 2.1 surveys general purpose mechanisms for enforcing
differential privacy. Section 2.2 presents our main competitor, the 
matrix mechanism [16]. 

2.1 Differential Privacy Mechanisms 

Differential privacy was first formally presented in [11], though 
some previous studies have informally used similar models, e.g., 
[9]. The Laplace mechanism [11] is the first generic mechanism 
for enforcing differential privacy, which works when the output do- 
main is a multi-dimensional Euclidean space. McSherry and Tal- 
war [21] propose the exponential mechanism, which applies to any 
problem with a measurable output space. The generality of the ex- 
ponential mechanism makes it an important tool in the design of 
many other differentially private algorithms, e.g., [6, 29, 21]. 

Linear query processing is of particular interest in both the the- 
ory and database communities, due to its wide range of applica- 
tions. To minimize the error of linear queries under differential 
privacy requirements, several methods try to build a synopsis of the 
original database, such as Fourier transformations [24], wavelets 
[28] and hierarchical trees [15]. By publishing a noisy synopsis 
under $\epsilon$-differential privacy, these methods are capable of answering
an arbitrary number of linear queries. However, most of these
methods obtain good accuracy only when the query selection crite- 
rion is a continuous range; meanwhile, since these methods are not 
workload-aware, their performance for a specific workload tends to 
be sub-optimal. 

The compressive mechanism [17] reduces the amount of noise 
necessary to satisfy differential privacy, by utilizing the sparsity of 
the dataset under certain transformations. The main idea is to use 
a technique called compressive sensing to compress a sparse repre- 
sentation of the data into a compact synopsis, and inject noise into 
the much smaller synopsis instead of the original data. After that, 
the method reconstructs the original data by applying the decoding
algorithm of compressive sensing to the noisy synopsis. The
result provides significantly higher utility, while satisfying
differential privacy requirements.

Several theoretical studies have derived lower bounds for the
noise level for processing linear queries under differential privacy.
Notably, Dinur and Nissim [9] prove that any perturbation mechanism
with maximal noise of scale $o(n)$ cannot possibly preserve
personal privacy, if the adversary is allowed to ask all possible linear
queries, and has exponential computation capacity. By reducing
the computation capacity of the adversary to polynomial-bounded
Turing machines, they show that an error scale $\Omega(\sqrt{n})$ is
necessary to protect any individual's privacy.

More recently, Hardt and Talwar [14] have significantly tight- 
ened the error lower bound for answering a batch of linear queries 
under differential privacy. Given a batch of m linear queries, they 
prove that any $\epsilon$-differential privacy mechanism leads to squared
error of at least $\Omega(\epsilon^{-2} m^3 \mathrm{Vol}(W))$, where
$\mathrm{Vol}(W)$ is the volume of the convex body obtained by transforming
the $L_1$-unit ball into m-dimensional space using the linear
transformations in the workload W. They also propose a mechanism for
differential privacy
whose error level almost reaches this lower bound. However, their 
mechanism relies on uniform sampling in a high-dimensional con- 
vex body, which, although it theoretically takes polynomial time, 
is too expensive to be of practical use. This paper extends their 
analysis to low-rank workload matrices. 

Besides linear queries, differential privacy is also applicable to 
more complex queries in various research areas, due to its strong 
privacy guarantee. In the field of data mining, Friedman and Schus- 
ter [13] propose the first algorithm for building a decision tree un- 
der differential privacy. Mohammed et al. [22] study the same 
problem, and propose an improved solution based on a general- 
ization strategy coupled with the exponential mechanism. Ding et 
al. [8] investigate the problem of differentially private data cube 
publication. They present a randomized materialized view selec- 
tion algorithm, which reduces the overall error, and preserves data 
consistency. 

In the database literature, a plethora of methods have been pro- 
posed to optimize the accuracy of differentially private query pro- 
cessing. Cormode et al. [6] investigate the problem of multi- 
dimensional indexing under differential privacy, with the novel idea 
of assigning different amounts of privacy budget to different levels 
of the index. Xu et al. [29] optimize the procedure of building a 
differentially private histogram, with an interesting combination of 
a dynamic programming algorithm for optimal histogram compu- 
tation and the exponential mechanism. 

Differential privacy is also becoming a hot topic in the machine 
learning community, especially for learning tasks involving sen- 
sitive information, e.g., medical records. In [4], Chaudhuri et al. 
propose a generic differentially private learning algorithm, which 
requires strong convexity of the objective function. Rubinstein et 
al. [25] study the problem of SVM learning on sensitive data, and 
propose an algorithm to perturb the kernel matrix with performance 
guarantees, when the loss function satisfies the $\ell$-Lipschitz continuity
property. General differential privacy techniques have also been
applied to real systems, such as network trace analysis [19] and 
private recommender systems [20]. 

2.2 Matrix Mechanism 

Li et al. [16] propose the matrix mechanism, which targets the
same problem as this work, i.e., answering a batch of linear queries
under differential privacy. Given a workload of linear queries, the
matrix mechanism first constructs a workload matrix W of size
$m \times n$, where m is the number of queries, and n is the number of
unit counts. The construction of the workload matrix is elaborated
further in Section 3. After that, the mechanism searches for a strategy
matrix A of size $r \times n$, where r is a positive integer. Intuitively,
A corresponds to another set of linear queries, such that every query
in W can be expressed as a linear combination of the queries in A.
The matrix mechanism then answers the queries in A under differential
privacy, and subsequently uses their noisy results to answer
queries in W.

The main challenge faced by the matrix mechanism is to identify
the strategy matrix A that answers W with the highest accuracy.
The solution in [16] is limited to the case where (i) A has a
pseudo-inverse $A^{+}$; and (ii) A is optimized based on the $L_2$
approximation of the objective function. However, one necessary condition
for a matrix A to have a pseudo-inverse is that there must be at least
as many rows as columns, i.e., $r \ge n$. This requirement seriously
limits the search space for A. For instance, imagine an application,
akin to that shown in Figure 1, where there are 50 unit counts, each
corresponding to a state in the US. Then, the strategy matrix must
have at least 50 queries, regardless of how many queries there are
in the original workload W. None of the strategies used in the
example of Figure 1 can be identified by the matrix mechanism,
simply because they do not contain enough queries. Furthermore,
the optimal answer computed using the modified objective function
(i.e., its $L_2$ approximation) does not necessarily lead to low error
according to the original objective function. In fact, throughout
our experimental evaluations, we have never found a single setting
where the matrix mechanism obtains lower overall error than the
naive solution of injecting noise directly into the unit counts. In
addition, the matrix mechanism also incurs a high computational
overhead. Overall, the matrix mechanism is mainly of theoretical
interest.

3. PRELIMINARIES 

In this paper, we assume there are n records in a database D, i.e.,
$D = \{x_1, x_2, \ldots, x_n\}$. Each $x_i$ in D is a real number. To
facilitate matrix manipulations, in the rest of the paper we use a vector
of size $n \times 1$ to denote the database, i.e.,
$D = (x_1, x_2, \ldots, x_n)^T$. In Figure 1, for example, each record
contains the number of HIV+ patients in a state of the USA. A query set Q
of cardinality m is a mapping from the database domain to real numbers,
i.e., $Q : \mathcal{D} \mapsto \mathbb{R}^m$.

3.1 Differential Privacy 

A query processing mechanism M is a randomized mapping
from $\mathcal{D} \times \mathcal{Q}$ to $\mathbb{R}^m$. Given an arbitrary
query set $Q \in \mathcal{Q}$ and a database $D \in \mathcal{D}$, the
mechanism M returns a distribution on the query output domain
$\mathbb{R}^m$. Two databases $D_1$ and $D_2$ are neighbor databases iff
they differ on exactly one record, i.e.,
$D_1 = \{x_1, x_2, \ldots, x_i, \ldots, x_n\}$ and
$D_2 = \{x_1, x_2, \ldots, x'_i, \ldots, x_n\}$. A randomized mechanism M
satisfies $\epsilon$-differential privacy if for every pair of neighbor
databases $D_1$ and $D_2$, we have

$$\forall Q, \forall R: \Pr(M(Q, D_1) = R) \le e^{\epsilon} \Pr(M(Q, D_2) = R) \qquad (1)$$

The above inequality implies that the mechanism M always re- 
turns similar results on neighbor databases. This limits the adver- 
sary's confidence in inferring any record from the output of M, 
even when he or she knows all remaining records in the database. 

In [11], Dwork et al. presented a general protocol to implement
$\epsilon$-differential privacy, utilizing the concept of sensitivity.
Given a query set $Q \in \mathcal{Q}$, the sensitivity $\Delta$ is the
maximal $L_1$ distance between the exact query results on any neighbor
databases $D_1$ and $D_2$, i.e.,

$$\Delta = \max_{D_1, D_2} \|Q(D_1) - Q(D_2)\|_1 \qquad (2)$$

We emphasize that $\Delta$ only depends on the data domain $\mathcal{D}$
and the query set Q, not the actual data. Therefore, we simply assume
such a constant $\Delta$ is public knowledge to everyone, including the
adversary. The Laplace mechanism [11], $M_L$, outputs a randomized
result R on database D, following a Laplace distribution with
mean $Q(D)$ and magnitude $\Delta/\epsilon$; i.e.,

$$\Pr(M_L(Q, D) = R) \propto \exp\left(-\frac{\epsilon}{\Delta}\|R - Q(D)\|_1\right) \qquad (3)$$

This is equivalent to adding m-dimensional independent Laplace
noise, as $Q(D) + \mathrm{Lap}(\Delta/\epsilon)^m$, in which
$\mathrm{Lap}(\Delta/\epsilon)$ is a random variable following a zero-mean
Laplace distribution with scale $\Delta/\epsilon$. Based on the definition
of the Laplace mechanism, the expected squared error of the randomized
query answer is $2m\Delta^2/\epsilon^2$, since the variance of
$\mathrm{Lap}(s)$ is $2s^2$ for any scale s. Note that the amount of
error only depends on the sensitivity of the queries, regardless of
the records in database D.
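
As an illustration of the Laplace mechanism and its error formula, the sketch below adds independent Laplace noise of scale $\Delta/\epsilon$ to a vector of exact answers and compares the empirical squared error with $2m\Delta^2/\epsilon^2$. It is a minimal sketch under assumed values: the exact answers, sensitivity and privacy budget are made up for the demonstration, and `laplace_mechanism` is a hypothetical helper name, not code from the paper.

```python
import numpy as np

def laplace_mechanism(exact_answers, sensitivity, eps, rng):
    """Return exact_answers plus independent Lap(sensitivity / eps) noise."""
    exact_answers = np.asarray(exact_answers, dtype=float)
    scale = sensitivity / eps
    return exact_answers + rng.laplace(loc=0.0, scale=scale,
                                       size=exact_answers.shape)

rng = np.random.default_rng(0)
exact = np.array([10.0, 20.0, 30.0])   # assumed exact answers Q(D), so m = 3
sensitivity, eps = 2.0, 0.5            # assumed Delta and privacy budget

# Average the squared error over many independent runs of the mechanism.
trials = 20000
errors = [np.sum((laplace_mechanism(exact, sensitivity, eps, rng) - exact) ** 2)
          for _ in range(trials)]
empirical = float(np.mean(errors))
expected = 2 * exact.size * sensitivity ** 2 / eps ** 2   # 2 m Delta^2 / eps^2

print(f"empirical {empirical:.1f} vs expected {expected:.1f}")
```

The empirical average should land close to the closed-form value, and neither depends on the contents of `exact`, which matches the remark that the error is independent of the records.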

3.2 Batch Linear Queries 

As mentioned in the introduction, we focus on non-interactive
linear queries in this paper. A linear query $q(D)$ is in the form of
a linear function over the records in the database. Given a weight
vector $(w_1, w_2, \ldots, w_n)^T$ of size n, the linear query returns the
dot product between the weight vector and database vector, i.e.,

$$q(D) = w_1 x_1 + w_2 x_2 + \ldots + w_n x_n$$

We assume a batch of m linear queries, $Q = \{q_1, q_2, \ldots, q_m\}$,
is submitted to the database at the same time. The query set Q
is thus represented by a workload matrix W with m rows and n
columns. Each entry $W_{ij}$ in W is the j-th coefficient for query
$q_i$ on record $x_j$. Using the vector representation of the database,
i.e., $D = (x_1, x_2, \ldots, x_n)^T$, the query batch Q can be exactly
answered by calculating:

$$Q(D) = WD = \left(\sum_j W_{1j} x_j, \ldots, \sum_j W_{mj} x_j\right)^T$$
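
To make the matrix form concrete, the short sketch below encodes the three queries from the introduction (total over four states, NY plus NJ, CA plus WA) as a workload matrix and evaluates them on the unit counts of Figure 1(b). The variable names are choices made for this illustration.

```python
import numpy as np

# Unit counts from Figure 1(b), ordered NY, NJ, CA, WA.
D = np.array([82700, 19000, 67000, 5900])

# One row of weights per query: q1 (all four states), q2 (NY + NJ), q3 (CA + WA).
W = np.array([[1, 1, 1, 1],
              [1, 1, 0, 0],
              [0, 0, 1, 1]])

exact_answers = W @ D              # Q(D) = W D, one dot product per query
print(exact_answers.tolist())      # [174600, 101700, 72900]
```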

Based on the Laplace mechanism, two baseline solutions to enforce
$\epsilon$-differential privacy on a query batch with workload W are
as follows.

Noise on data: This solution, denoted as $M_D$, adds noise to the
original data. Given database D, $M_D$ generates a noisy database
$D'$ using the Laplace mechanism, i.e.,
$D' = D + \mathrm{Lap}(\Delta/\epsilon)^n$. The query batch Q is then
answered by replacing D with $D'$. The whole mechanism can be written in
the form of manipulation on random variables, as follows.

$$M_D(Q, D) = WD' = W\left(D + \mathrm{Lap}\left(\frac{\Delta}{\epsilon}\right)^n\right) \qquad (4)$$

Based on the linearity of expectation, it is straightforward to calculate
the expected squared error on the output,
$\frac{2\Delta^2}{\epsilon^2}\sum_{i,j} W_{ij}^2$, which is proportional to
the squared sum of the entries in W.

Noise on results: This baseline solution, denoted as $M_R$, adds
noise to the query results instead of the original data. Since the
queries are linear queries, the sensitivity of the query set is
$\Delta' = \max_j \sum_i |W_{ij}| \Delta$, i.e., the highest column
absolute sum [16]. Thus, $M_R$ outputs the following random results.

$$M_R(Q, D) = WD + \mathrm{Lap}\left(\frac{\Delta'}{\epsilon}\right)^m \qquad (5)$$

Similarly, the expected squared error of the mechanism on query
Q is $2m\Delta'^2/\epsilon^2 = 2m \max_j \left(\sum_i |W_{ij}|\right)^2
\Delta^2 \epsilon^{-2}$. By comparing their expected squared errors, we
derive that $M_R$ outperforms $M_D$ in expectation iff
$m \max_j \left(\sum_i |W_{ij}|\right)^2 < \sum_{i,j} W_{ij}^2$. When
$m \ge n$, this inequality can never hold, because
$\sum_{i,j} W_{ij}^2 \le n \max_j \sum_i W_{ij}^2 \le
n \max_j \left(\sum_i |W_{ij}|\right)^2$. Hence $M_R$ can be more effective
only when m is smaller than n.
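
The two baselines and their closed-form errors are easy to prototype. The following is a minimal sketch, assuming unit data sensitivity ($\Delta = 1$) by default; the function names are hypothetical and the workloads are small illustrative examples rather than data from the paper.

```python
import numpy as np

def noise_on_data(W, D, eps, rng, delta=1.0):
    """M_D: perturb every record, then evaluate the workload on the noisy data."""
    noisy_D = D + rng.laplace(scale=delta / eps, size=D.shape)
    return W @ noisy_D

def noise_on_results(W, D, eps, rng, delta=1.0):
    """M_R: evaluate the workload exactly, then perturb every answer."""
    delta_prime = np.abs(W).sum(axis=0).max() * delta   # highest column absolute sum
    return W @ D + rng.laplace(scale=delta_prime / eps, size=W.shape[0])

def expected_error_data(W, eps, delta=1.0):
    return 2.0 * delta ** 2 / eps ** 2 * np.sum(W ** 2)

def expected_error_results(W, eps, delta=1.0):
    delta_prime = np.abs(W).sum(axis=0).max() * delta
    return 2.0 * W.shape[0] * delta_prime ** 2 / eps ** 2

rng = np.random.default_rng(1)
D = np.array([82700., 19000., 67000., 5900.])

W_batch = np.array([[1., 1., 1., 1.],     # three correlated queries, m = 3, n = 4
                    [1., 1., 0., 0.],
                    [0., 0., 1., 1.]])
W_single = np.array([[1., 1., 1., 1.]])   # a single total query, m = 1

for name, W in [("batch", W_batch), ("single", W_single)]:
    print(name,
          "M_D error:", expected_error_data(W, eps=1.0),
          "M_R error:", expected_error_results(W, eps=1.0))

noisy_d = noise_on_data(W_batch, D, eps=1.0, rng=rng)
noisy_r = noise_on_results(W_batch, D, eps=1.0, rng=rng)
```

In this sketch noise on results wins for the single total query, while noise on data wins for the three-query batch, which illustrates that neither baseline dominates the other.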

3.3 Low Rank Matrices 

For any square matrix $A = \{A_{ij}\}$ of size $n \times n$, the trace of
the matrix is the sum of the diagonal entries in A, i.e.,
$\mathrm{tr}(A) = \sum_i A_{ii}$. Given a matrix $W = \{W_{ij}\}$ of size
$m \times n$, the Frobenius norm of W is the square root of the squared sum
over all entries, i.e., $\|W\|_F = \sqrt{\sum_{i,j} W_{ij}^2}$. Following
common notation, $W^T$ denotes the transposed matrix of W.

Singular value decomposition (SVD) applies to any real-valued
matrix W. Specifically, the result of SVD on W includes three
matrices, $U$, $\Sigma$ and $V$, such that $W = U \Sigma V$. Here, $U$,
$\Sigma$, and $V$ are of size $m \times s$, $s \times s$, and $s \times n$
respectively, where m and n are the number of rows and columns in W
respectively, and s is a positive integer no larger than $\min\{m, n\}$.
Moreover, the columns of $U$ and the rows of $V$ are orthonormal.
$\Sigma$ is a diagonal matrix, which contains non-negative real numbers
on the diagonal and zeros in all the other entries. These diagonal
entries, $\{\lambda_1, \lambda_2, \ldots, \lambda_s\}$, are the singular
values of W, which this paper refers to as the eigenvalues of W.
The number of non-zero eigenvalues is called the rank of W,
denoted as $\mathrm{rank}(W)$.
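
These definitions map directly onto standard numerical routines. The sketch below uses NumPy's `numpy.linalg.svd` on the three-query workload from the introduction; the tolerance used to decide which diagonal entries count as non-zero is an assumption of this example.

```python
import numpy as np

W = np.array([[1., 1., 1., 1.],
              [1., 1., 0., 0.],
              [0., 0., 1., 1.]])

# Reduced SVD: U is m x s, sigma holds the s diagonal entries of Sigma,
# and V is s x n, with s = min(m, n).
U, sigma, V = np.linalg.svd(W, full_matrices=False)

tol = 1e-10                                  # assumed threshold for "non-zero"
rank = int(np.sum(sigma > tol))
reconstructed = U @ np.diag(sigma) @ V

frobenius = np.sqrt(np.sum(W ** 2))          # ||W||_F
trace_wtw = np.trace(W.T @ W)                # tr(W^T W) equals ||W||_F^2

print("diagonal entries:", sigma)
print("rank:", rank)
```

The workload has three rows but rank 2, because its first query is the sum of the other two; this is the kind of row correlation the low-rank property captures.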

When the rows and columns in the matrix W are correlated, the 
rank of the matrix W can be smaller than m and n. In such cases, 
we say that W is a low rank matrix. For example, when a group 
of records tend to appear together in a query, the workload matrix 
W often exhibits strong column correlations. Similarly, when one 
query can be expressed as the linear combination of other queries, 
W has strong row correlations. Both cases can be exploited to re- 
duce the noise level necessary to satisfy differential privacy, as we 
showed in Section 1. Next we present the Low Rank Mechanism, 
a general solution to enforce differential privacy on a batch of lin- 
ear queries, which utilizes the low rank property of the workload 
matrix to reduce noise. 

4. WORKLOAD DECOMPOSITION 

In this section, we propose a general workload matrix decomposition
technique that minimizes the error for a batch of linear
queries. Recall that the example in Figure 1 shows that instead
of adding noise to the original data or query results (i.e., methods
NOD and NOR), it is sometimes possible to construct another linear
basis that leads to higher overall query accuracy. To build such
a basis, we decompose the workload matrix W into the product of
two components, $B = \{B_{ij}\}$ of size $m \times r$ and
$L = \{L_{ij}\}$ of size $r \times n$, such that $W = BL$. Note that r can
be larger than the rank of the workload matrix W. Given the matrix
decomposition, we design a general mechanism for adding noise to $LD$
(D is the dataset), and analyze the expected squared error. We first
formally define the concepts of query scale and query sensitivity, for a
given decomposition $W = BL$.

DEFINITION 1. Query Scale

Given a workload decomposition $W = BL$, the scale of the decomposition,
denoted by $\Phi(B, L)$, is the squared sum of the entries
in B, i.e., $\Phi(B, L) = \sum_{i,j} B_{ij}^2$.

DEFINITION 2. Query Sensitivity

Given a workload decomposition $W = BL$, the sensitivity of the
decomposition, denoted by $\Delta(B, L)$, is the maximal absolute sum
of any column in L, i.e., $\Delta(B, L) = \max_j \sum_i |L_{ij}|$.

Since $W = BL$, the linear query batch can be answered by
calculating $Q(D) = WD = BLD$. Unlike solutions NOD and
NOR, we inject noise into the intermediate result $LD$ to enforce
differential privacy. Since $LD$ is another group of linear queries,
we can apply NOR on $Q'(D) = LD$ with Eq. (5). The sensitivity
of the new linear query batch is $\Delta(B, L)$, which leads to the
following differential privacy mechanism $M_P(Q, D)$ with respect to
the workload decomposition $W = BL$.

$$M_P(Q, D) = B\left(LD + \mathrm{Lap}\left(\frac{\Delta(B, L)}{\epsilon}\right)^r\right) \qquad (6)$$

The error analysis of $M_P(Q, D)$ is complicated as it adds noise
at an intermediate step. The following lemma shows that the error
is linear in the query scale, and quadratic in the query sensitivity.

LEMMA 1. The expected squared error of $M_P(Q, D)$ with respect to the
decomposition $W = BL$ is $2\Phi(B, L)\,(\Delta(B, L))^2/\epsilon^2$.
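
A compact way to see Eq. (6) and Lemma 1 together is to implement the mechanism for a given decomposition and compare its simulated error with the closed form. The sketch below is an illustration under assumptions: the decomposition (answer the last two queries of the introduction's example and add them for the first) is supplied by hand rather than optimized, and all function names are hypothetical.

```python
import numpy as np

def query_scale(B):
    return float(np.sum(B ** 2))                    # Phi(B, L)

def query_sensitivity(L):
    return float(np.abs(L).sum(axis=0).max())       # Delta(B, L)

def decomposition_mechanism(B, L, D, eps, rng):
    """M_P as in Eq. (6): perturb the intermediate answers L @ D, then apply B."""
    scale = query_sensitivity(L) / eps
    noisy_intermediate = L @ D + rng.laplace(scale=scale, size=L.shape[0])
    return B @ noisy_intermediate

def expected_squared_error(B, L, eps):
    """Closed form from Lemma 1."""
    return 2.0 * query_scale(B) * query_sensitivity(L) ** 2 / eps ** 2

# Hand-picked decomposition of the introduction's workload, W = B @ L.
L = np.array([[1., 1., 0., 0.],
              [0., 0., 1., 1.]])
B = np.array([[1., 1.],
              [1., 0.],
              [0., 1.]])
D = np.array([82700., 19000., 67000., 5900.])
W = B @ L

rng = np.random.default_rng(2)
eps = 1.0
trials = 20000
errors = [np.sum((decomposition_mechanism(B, L, D, eps, rng) - W @ D) ** 2)
          for _ in range(trials)]
empirical = float(np.mean(errors))
predicted = expected_squared_error(B, L, eps)

print(f"empirical {empirical:.2f} vs Lemma 1 {predicted:.2f}")
```

Here the scale is 4 and the sensitivity is 1, so the predicted total error is $8/\epsilon^2$, the sum of the per-query variances 4, 2 and 2 from the introduction.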

Accordingly, we reduce the problem to finding the optimal workload
decomposition $W = BL$ that minimizes $\Phi(B, L)\,(\Delta(B, L))^2$.
However, this optimization problem is difficult to solve, since the
objective function is a product involving $\Phi(B, L)$ and $\Delta(B, L)$,
and $\Delta(B, L)$ may not be differentiable. To address this problem, we
first prove an interesting property of the workload decomposition, which
implies that the exact query sensitivity is actually not important.

LEMMA 2. Given a workload decomposition $W = BL$ and a
positive constant $\alpha$, we can always construct another decomposition
$W = B'L'$ such that $B' = \alpha B$ and $L' = \alpha^{-1} L$, satisfying

$$\Phi(B, L)\,(\Delta(B, L))^2 = \Phi(B', L')\,(\Delta(B', L'))^2$$
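
The identity in Lemma 2 follows in a few lines from Definitions 1 and 2. The derivation below is a short verification sketch written for this purpose; it is not claimed to be the proof given by the authors.

```latex
\begin{align*}
B'L' &= (\alpha B)(\alpha^{-1} L) = BL = W, \\
\Phi(B', L') &= \sum_{i,j} (\alpha B_{ij})^2 = \alpha^{2}\, \Phi(B, L), \\
\Delta(B', L') &= \max_j \sum_i \left|\alpha^{-1} L_{ij}\right|
               = \alpha^{-1}\, \Delta(B, L), \\
\Phi(B', L')\,(\Delta(B', L'))^2
  &= \alpha^{2}\,\Phi(B, L) \cdot \alpha^{-2}\,(\Delta(B, L))^2
   = \Phi(B, L)\,(\Delta(B, L))^2 .
\end{align*}
```

The factors $\alpha^{2}$ and $\alpha^{-2}$ cancel, so rescaling moves weight between the scale and the sensitivity without changing the expected error of Lemma 1.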

According to the above lemma, the balance between scale and 
sensitivity is not important, as we can always build another equiv- 
alent workload decomposition with arbitrary sensitivity. This mo- 
tivates us to formulate a new optimization program, which focuses 
on minimizing the query scale while fixing the query sensitivity. 
The following theorem formalizes this claim. 

THEOREM 1. Given the workload W, $W = BL$ is the optimal workload
decomposition to minimize expected squared error if
$(B, L)$ is the optimal solution to the following program:

$$\begin{aligned}
\text{Minimize: } & \mathrm{tr}(B^T B) \\
\text{s.t. } & W = BL \\
& \forall j: \sum_i |L_{ij}| \le 1
\end{aligned} \qquad (7)$$

In the optimization problem above, we are allowed to specify 
the number of columns in the matrix B, i.e. the rank r of the 
matrix product BL. This enables us to generate matrices of sig- 
nificantly lower rank than the strategy matrix proposed in [16]. We 
thus use Low Rank Mechanism to denote the general query process- 
ing scheme in Eq. (6), using the optimal decomposition solution to 
Formula (7). 
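
One practical consequence of Lemma 2 is that any decomposition can be rescaled to satisfy the column constraint of Formula (7), after which candidates can be compared by $\mathrm{tr}(B^T B)$ alone. The sketch below illustrates this on two hand-picked decompositions of the introduction's workload; it only evaluates the objective and does not solve the optimization program. The helper names are hypothetical.

```python
import numpy as np

def normalize_decomposition(B, L):
    """Rescale (B, L) so that every column of L has absolute sum at most 1,
    while keeping the product B @ L unchanged."""
    alpha = np.abs(L).sum(axis=0).max()     # Delta(B, L) of the input pair
    return alpha * B, L / alpha

def objective(B):
    return float(np.trace(B.T @ B))         # tr(B^T B), which equals Phi(B, L)

W = np.array([[1., 1., 1., 1.],
              [1., 1., 0., 0.],
              [0., 0., 1., 1.]])

# Candidate 1: L = W and B = I (noise on results); its sensitivity is 2.
B1, L1 = normalize_decomposition(np.eye(3), W)
# Candidate 2: answer only the last two queries and add them up for the first.
B2, L2 = normalize_decomposition(np.array([[1., 1.], [1., 0.], [0., 1.]]), W[1:])

for B, L in [(B1, L1), (B2, L2)]:
    assert np.allclose(B @ L, W)                        # constraint W = BL
    assert np.abs(L).sum(axis=0).max() <= 1 + 1e-12     # column constraint

print(objective(B1), objective(B2))   # 12.0 4.0
```

With the sensitivity fixed at 1, the expected squared error is $2\,\mathrm{tr}(B^T B)/\epsilon^2$, so the second candidate (objective 4) is three times more accurate than the first (objective 12).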

4.1 Optimality Analysis 

In this subsection, we analyze the optimality of our optimization 
formulation. Specifically, we show that the utility of our proposed 
mechanism almost reaches the known utility lower bound for linear 
queries under differential privacy [14]. 

LEMMA 3. Given a workload matrix W of rank r with eigenvalues
$\{\lambda_1, \ldots, \lambda_r\}$, the expected squared error of
$M_P(Q, D)$ w.r.t. the optimal decomposition $W = B^*L^*$ in the low rank
mechanism is bounded above by $\sum_{k=1}^{r} \lambda_k^2 r/\epsilon^2$.



Using the geometric analysis technique under orthogonal pro- 
jection [14], the following lemma reveals a lower bound on the 
squared error for linear queries. 

LEMMA 4. Given a workload matrix W of rank r with eigenvalues
$\{\lambda_1, \ldots, \lambda_r\}$, the expected squared error of any
$\epsilon$-differential privacy mechanism is at least

n((Jr>)"V) 

Assume that all the eigenvalues $\{\lambda_1, \lambda_2, \ldots,
\lambda_r\}$ of workload W are ordered in non-ascending order. We use
$C = \lambda_1/\lambda_r$ to denote the ratio between the largest
eigenvalue and the smallest non-zero eigenvalue. The following theorem
discusses the tightness of the low rank mechanism on error minimization. In
particular, it proves the optimality of the resulting decomposition
$W = B^*L^*$ with respect to Formula (7).

THEOREM 2. When $r > 5$, the mechanism $M_P(Q, D)$ using
$W = B^*L^*$ is an $O(C^2 r)$-approximately optimal solution w.r.t.
the set of all non-interactive $\epsilon$-differential privacy mechanisms.

When C is close to 1, all non-zero eigenvalues are close to each 
other and the mechanism under our decomposition optimization 
program outputs results that well approximate the lower bound. 
This result answers one of the questions in [14], in which the au- 
thors discussed possible orthogonal projections but did not provide 
a concrete algorithm to identify the optimal projection. Our formu- 
lation can be regarded as an implementation of orthogonal projec- 
tion with almost constant approximation. Therefore, our result fills 
the gap between theory and practice. 

4.2 Relaxation on Decomposition 

Theorem 2 shows that our decomposition leads to results with
a tight bound. However, when there are very small eigenvalues in
the workload matrix W, the bound in the theorem becomes loose.
On the other hand, these small eigenvalues contribute little to the
workload matrix W. This observation motivates us to design a new
optimization formulation, in which $BL$ does not necessarily match
W exactly, but only within a small error tolerance. This enables the
formulation to find a more compact decomposition, such that the r used in B
and L can be smaller than the actual rank of W.

To do this, we introduce a new parameter $\gamma$ to bound the difference
between W and $BL$ in terms of the Frobenius norm. This
leads to a new optimization problem:

$$\begin{aligned}
\text{Minimize: } & \mathrm{tr}(B^T B) \\
\text{s.t. } & \|W - BL\|_F \le \gamma \\
& \forall j: \sum_i |L_{ij}| \le 1
\end{aligned} \qquad (8)$$

After finding the optimal $(B, L)$ for the problem in Formula (8),
the mechanism $M_P(Q, D)$ outputs query results using Eq. (6).
The error of this new mechanism is also bounded, as stated in the
following theorem.

THEOREM 3. The expected squared error of $M_P(Q, D)$ using
the decomposition $(B, L)$ satisfying Formula (8) is at most

$$2\,\mathrm{tr}(B^T B)/\epsilon^2 + \gamma \sum_i x_i^2$$
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Algorithm 1 Workload Matrix Decomposition 

 1: Initialize $\pi^{(0)} = 0 \in \mathbb{R}^{m \times n}$, $\beta^{(0)} = 1$, $k = 1$
 2: while not converged do
 3:     // Approximately solve the subproblem
 4:     while not converged do
 5:         $B^{(k)} \leftarrow$ update $B$ using Eq. (9)
 6:         $L^{(k)} \leftarrow$ run Algo. 2 to update $L$ w.r.t. Formula (10)
 7:     Compute $\tau = \|W - B^{(k)} L^{(k)}\|_F$
 8:     if $\tau$ is sufficiently small or $\beta$ is sufficiently large then
 9:         return $B^{(k)}$ and $L^{(k)}$
10:     if $k$ is divisible by 10 then
11:         $\beta^{(k+1)} = 2\beta^{(k)}$
12:     $\pi^{(k+1)} = \pi^{(k)} + \beta^{(k+1)} (W - B^{(k)} L^{(k)})$
13:     $k = k + 1$



While Theorem 3 implies the possibility of estimating the optimal
$\gamma$, it is not practical to implement it directly, because this
estimation depends on the data, i.e., $\sum_i x_i^2$. In our experiments,
we test different values of $\gamma$, and report their relative performance,
regardless of the data distribution.
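To make the data dependence concrete, the following Python sketch evaluates the bound stated in Theorem 3 for a given decomposition factor. It is an illustration only: the function name `theorem3_bound`, the example matrix and the example data vector are hypothetical and do not come from the experiments.

```python
def theorem3_bound(B, x, epsilon, gamma):
    """Bound of Theorem 3: 2*tr(B^T B)/epsilon^2 + gamma * sum_i x_i^2."""
    # tr(B^T B) equals the sum of the squared entries of B.
    trace_btb = sum(b * b for row in B for b in row)
    noise_term = 2.0 * trace_btb / (epsilon ** 2)
    # This term needs the data vector x, which is exactly the sensitive input.
    structural_term = gamma * sum(xi * xi for xi in x)
    return noise_term + structural_term

B = [[1.0, 0.0], [0.0, 2.0]]   # hypothetical 2 x 2 factor B
x = [3.0, 4.0]                 # hypothetical data vector (unit counts)
print(theorem3_bound(B, x, epsilon=1.0, gamma=0.01))
```

The first term depends only on the workload decomposition and the privacy budget, whereas the second term cannot be computed without access to the counts, which is why the text resorts to testing several values of the relaxation parameter.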

5. DECOMPOSITION ALGORITHM 

The previous section formulates the workload matrix decompo- 
sition problem as an optimization program, which is rather compli- 
cated and non-trivial to solve. This section describes an effective 
and efficient solution for this program, based on the inexact Aug- 
mented Lagrangian Multiplier (ALM) method [5, 18]. 

The main challenge in solving the optimization program of Formula (8)
is the non-smooth $L_1$ regularized term. The projected
gradient method [10] is considered one of the most efficient general
algorithms to solve these problems. Following the strategy
used in [5], we treat the $L_1$ regularized term separately and
approximately minimize a sequence of Lagrangian subproblems. Our
inexact Augmented Lagrangian method for the workload matrix
decomposition problem is summarized in Algorithm 1.

In order to handle the linear constraints $\|W - BL\|_F \le \gamma \to 0$,
in which $W \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{m \times r}$ and $L \in \mathbb{R}^{r \times n}$, the inexact
Augmented Lagrangian method introduces a positive penalty item
$\beta \in \mathbb{R}$ and the Lagrange multiplier $\pi \in \mathbb{R}^{m \times n}$. The update on $\beta$
and $\pi$ follows the standard strategy used in [5, 18]. Given fixed $\beta$
and $\pi$ in each iteration, the algorithm aims to find a pair of new $B$
and $L$ to minimize the following subproblem:

$$J(B, L, \beta, \pi) = \frac{1}{2}\mathrm{tr}(B^T B) + \langle \pi, W - BL \rangle + \frac{\beta}{2}\|W - BL\|_F^2$$

$$\text{s.t.}\quad \forall j:\ \sum_i |L_{ij}| \le 1$$

This is a bi-convex optimization problem, which can be solved
by block gradient descent via alternately optimizing $B$ and $L$. Based
on the formulation above, optimizing $B$ is straightforward. The
gradient with respect to $B$ can be computed as:

$$\frac{\partial J}{\partial B} = B - \pi L^T + \beta B L L^T - \beta W L^T$$

Since $J(\cdot)$ is convex with respect to $B$, we can set
$\frac{\partial J}{\partial B} = 0$, and obtain a closed form solution to update $B$:

$$B = (\beta W L^T + \pi L^T)(\beta L L^T + I)^{-1} \tag{9}$$
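The closed-form update of Eq. (9) can be checked numerically. The sketch below is a minimal pure-Python illustration (the original implementation is in Matlab): the helper names `matmul`, `transpose`, `solve`, `update_B` and `grad_B`, as well as the small example matrices, are assumptions introduced here. Instead of forming the inverse explicitly, it solves the linear system $B(\beta LL^T + I) = (\beta W + \pi)L^T$ and then verifies that the gradient with respect to $B$ vanishes.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def solve(M, R):
    """Solve M Z = R by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + list(R[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]

def update_B(W, L, pi, beta):
    """Eq. (9): B = (beta W L^T + pi L^T)(beta L L^T + I)^{-1}."""
    Lt = transpose(L)
    r = len(L)
    bw_pi = [[beta * w + p for w, p in zip(wr, pr)] for wr, pr in zip(W, pi)]
    rhs = matmul(bw_pi, Lt)                       # m x r
    LLt = matmul(L, Lt)                           # r x r
    M = [[beta * LLt[i][j] + (1.0 if i == j else 0.0) for j in range(r)]
         for i in range(r)]
    # M is symmetric, so B M = rhs is equivalent to M B^T = rhs^T.
    return transpose(solve(M, transpose(rhs)))

def grad_B(B, W, L, pi, beta):
    """dJ/dB = B - pi L^T + beta B L L^T - beta W L^T."""
    Lt = transpose(L)
    BLLt = matmul(matmul(B, L), Lt)
    piLt = matmul(pi, Lt)
    WLt = matmul(W, Lt)
    return [[B[i][j] - piLt[i][j] + beta * BLLt[i][j] - beta * WLt[i][j]
             for j in range(len(B[0]))] for i in range(len(B))]

# Hypothetical toy instance with m = 2, n = 3, r = 2.
W = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
L = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
pi = [[0.1, 0.0, 0.0], [0.0, 0.0, -0.1]]
beta = 2.0
B_new = update_B(W, L, pi, beta)
print(max(abs(g) for row in grad_B(B_new, W, L, pi, beta) for g in row))
```

Solving the $r \times r$ system rather than inverting the matrix is a common numerical choice; since $\beta LL^T + I$ is symmetric positive definite, the system always has a unique solution.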

The second step is to optimize L, which is equivalent to solving 
the following quadratic programming problem: 



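Algorithm 2 Nesterov's Projection Gradient Method

 1: input: $G(L)$, $\frac{\partial G}{\partial L}$, $L^{(0)}$
 2: $\chi = r \cdot n \cdot 10^{-12}$, Lipschitz parameter: $\omega^{(0)} = 1$
 3: Initializations: $L^{(1)} = L^{(0)}$, $\delta^{(-1)} = 0$, $\delta^{(0)} = 1$, $t = 1$
 4: while not converged do
 5:     $\alpha = \frac{\delta^{(t-2)} - 1}{\delta^{(t-1)}}$, $S = L^{(t)} + \alpha (L^{(t)} - L^{(t-1)})$
 6:     for $j = 0$ to ... do
 7:         $\omega = 2^j \omega^{(t-1)}$, $U = S - \frac{1}{\omega} \frac{\partial G}{\partial S}$
 8:         Project $U$ to the feasible set to obtain $L^{(t)}$ (i.e. solve Formula (11))
 9:         if $\|S - L^{(t)}\|_F < \chi$ then
10:             return;
11:         Define function: $J_{\omega,S}(U) = G(S) + \langle \frac{\partial G}{\partial S}, U - S \rangle + \frac{\omega}{2}\|U - S\|_F^2$
12:         if $G(L^{(t)}) \le J_{\omega,S}(U)$ then
13:             $\omega^{(t)} = \omega$; $L^{(t+1)} = L^{(t)}$; break;
14:     Set $\delta^{(t)} = \frac{1 + \sqrt{1 + 4(\delta^{(t-1)})^2}}{2}$
15:     $t = t + 1$
16: return $L^{(t)}$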



$$G(L) = \frac{\beta}{2}\mathrm{tr}(L^T B^T B L) - \mathrm{tr}\big((\beta W + \pi)^T B L\big) \tag{10}$$

$$\text{s.t.}\quad \forall j:\ \sum_i |L_{ij}| \le 1$$

In order to minimize Eq. (10) under constraints, we employ Nesterov's
first order optimal method [23] to accelerate the gradient
descent. Nesterov's method has a much faster convergence rate than
traditional methods such as the subgradient method or the naive
projected gradient descent. In particular, the gradient of $G(L)$ with
respect to $L$ is

$$\frac{\partial G}{\partial L} = \beta B^T B L - \beta B^T W - B^T \pi$$

$L$ is updated by gradient descent while ensuring that the $L_1$
regularized constraint on $L$ is satisfied. This can be done by solving
the following optimization problem:

$$\min_L \|L - L^{(k)}\|_F^2, \quad \text{s.t.}\ \forall j:\ \sum_i |L_{ij}| \le 1, \tag{11}$$

in which $L^{(k)}$ denotes the last feasible solution after exactly $k$ iterations.
Since Formula (11) can be decoupled into $r$ independent $L_1$
regularized sub-problems, it can be solved efficiently by $L_1$
projection methods [10]. The complete algorithm for the projection
method is summarized in Algorithm 2.
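As an illustration of the projection step, the sketch below implements the standard sort-based Euclidean projection of a vector onto an $L_1$ ball and applies it to every column of $L$. This is an assumption about how the projection of [10] could be realized, not a transcription of the authors' Matlab code; the function names `project_l1_ball` and `project_columns` are hypothetical.

```python
def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {u : sum_i |u_i| <= radius}."""
    abs_v = [abs(x) for x in v]
    if sum(abs_v) <= radius:
        return list(v)  # already feasible, nothing to do
    # Find the soft-threshold theta from the magnitudes sorted in
    # decreasing order: theta uses the largest k that keeps u_k - t > 0.
    u = sorted(abs_v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, uk in enumerate(u, start=1):
        cumulative += uk
        t = (cumulative - radius) / k
        if uk - t > 0:
            theta = t
    # Shrink every magnitude by theta and restore the original signs.
    return [(1.0 if x > 0 else -1.0) * max(abs(x) - theta, 0.0) for x in v]

def project_columns(L, radius=1.0):
    """Project each column of L (a list of rows) onto the L1 ball."""
    cols = [project_l1_ball(list(c), radius) for c in zip(*L)]
    return [list(row) for row in zip(*cols)]

L = [[1.0, 0.2], [1.0, -0.3]]   # hypothetical 2 x 2 matrix
print(project_columns(L))
```

Columns that already satisfy the constraint are returned unchanged, so the projection only modifies the columns that a gradient step has pushed outside the feasible set.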

Convergence Analysis: In each iteration, the algorithm solves a
sequence of Lagrangian subproblems by optimizing $B$ (step 5) and
$L$ (step 6) alternately. The algorithm stops when the residual
$\|W - BL\|_F$ is sufficiently small or the penalty parameter $\beta$ is
sufficiently large. It suffices to guarantee that $L$ converges to the
optimal solution [18]. Although the objective function is non-smooth,
the algorithm possesses excellent convergence properties. To be
precise, we formally establish the following convergence statement.

THEOREM 4. If $(B^{(k)}, L^{(k)})$ is the temporary solution after the
$k$-th iteration and $(B^*, L^*)$ is the optimal solution to Formula (7),
we have

(12)

Figure 2: Effect of varying relaxation parameter $\gamma$ with the Search Logs dataset for LRM
Figure 3: Effect of varying $r$ with Search Logs dataset for LRM



Since $\beta^{(k)}$ doubles after every 10 iterations, the algorithm
converges rapidly. This proves the fast convergence property of our
algorithm.

Complexity Analysis: The total number of variables in $B$ and $L$
is $(m+n)r$. Each update on $B$ in Eq. (9) takes $O(r^2 m)$ time, while
each update on $L$ takes $O(r^2 n)$ time. If Algorithm 1 converges to
a local minimum with $N_{in}$ inner iterations (at line 4 in Algorithm
1) and $N_{out}$ outer iterations (at line 2 in Algorithm 1), the total
complexity of Algorithm 1 is $O(N_{in} \times N_{out} \times (r^2 m + r^2 n))$.

6. EXPERIMENTS 

This section demonstrates the effectiveness of the proposed Low-Rank
Mechanism (LRM), and compares it against four state-of-the-art
methods: the Matrix Mechanism (MM) [16], the Laplace
Mechanism (LM) [11], the Wavelet Mechanism (WM) [28] and the
Hierarchical Mechanism (HM) [15]. We implemented the Matrix
Mechanism (MM) by optimizing the $L_2$ approximation instead of
the $L_1$ error as suggested in [16]. The details of our MM implementation
are available in Appendix B. All methods were implemented
and tested in Matlab on a desktop PC with Intel quad-core 2.50 
GHz CPU and 4GBytes RAM. In all experiments, every algorithm 
is executed 20 times and the average performance is reported. We 
employ three popular real datasets used in [15, 29]: Search Log, 
Net Trace and Social Network. Search Log includes search key- 
word statistics collected from Google Trends and American On- 
line between 2004 and 2010. Social Network gives the number of 
users in a social network site with specific degrees in the social 
graph. Net Trace is a statistical database containing the number of 
TCP packets related to particular IP addresses, which is collected 



from a university intranet. Search Logs, Net Trace and Social Network
contain $2^{16}$ = 65,536, $2^{15}$ = 32,768 and 11,342 entries
respectively. The reader is referred to [15] for more details of these
datasets. We published our Matlab implementations of all algorithms
used in the experiments, as well as sample datasets, online
at http://yuanganzhao.weebly.com/.

To evaluate the impact of data domain cardinality on real datasets,
we transform the original counts into a vector of fixed size $n$ (domain
size), by merging consecutive counts in order. Given the number
$m$ of linear queries in the batch, we generate three different
types of workloads, namely WDiscrete, WRange and WRelated. In
WDiscrete, for each weight $W_{ij}$ of query $q_i$ in the batch, we
randomly set $W_{ij} = 1$ with probability 0.02 and set $W_{ij} = -1$
otherwise. In WRange, a batch of range queries on the domain
is generated, by randomly picking the starting location $a$ and
ending location $b$ following a uniform distribution on the domain.
Given the interval $(a, b)$, we set $W_{ij}$ of query $q_i$ in the batch to 1
for every $a \le j \le b$ and all other weights to 0. Finally, for
WRelated, we generate $s$ (discussed later) independent base queries $A$
of size $s \times n$, by randomly assigning weights to the queries under
a standard $(0, 1)$-normal distribution. A correlation
matrix $C$ of size $m \times s$ is generated similarly. The final workload
$W$ of size $m \times n$ is the product of $C$ and $A$.
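The three workload generators described above can be sketched as follows. This is a pure-Python illustration, not the published Matlab code: the function names, the 0-based indexing, the inclusive interval endpoints, the independent draw of the two endpoints and the fixed seed are all assumptions made here.

```python
import random

def gen_wdiscrete(m, n, rng, p_one=0.02):
    """WDiscrete: each weight is 1 with probability p_one and -1 otherwise."""
    return [[1 if rng.random() < p_one else -1 for _ in range(n)]
            for _ in range(m)]

def gen_wrange(m, n, rng):
    """WRange: each query sums a contiguous interval [a, b] of the domain."""
    W = []
    for _ in range(m):
        # Assumption: both endpoints are drawn uniformly, then ordered.
        a, b = sorted((rng.randrange(n), rng.randrange(n)))
        W.append([1 if a <= j <= b else 0 for j in range(n)])
    return W

def gen_wrelated(m, n, s, rng):
    """WRelated: W = C A with standard-normal C (m x s) and A (s x n)."""
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(s)]
    C = [[rng.gauss(0.0, 1.0) for _ in range(s)] for _ in range(m)]
    return [[sum(C[i][k] * A[k][j] for k in range(s)) for j in range(n)]
            for i in range(m)]

rng = random.Random(0)  # fixed seed, used only to make the sketch repeatable
W_discrete = gen_wdiscrete(4, 8, rng)
W_range = gen_wrange(4, 8, rng)
W_related = gen_wrelated(4, 8, 2, rng)
print(len(W_related), len(W_related[0]))
```

Because WRelated is built as a product of an $m \times s$ and an $s \times n$ matrix, its rank cannot exceed $s$, which is the property the later experiments exploit.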

We test the impact of five parameters in our experiments: $\gamma$, $r$,
$n$, $m$ and $s$. $\gamma$ is the relaxation factor defined in Formula (8). $r$ is
the number of columns in $B$ (and also the number of rows in $L$).
$n$ is the size of the domain and $m$ is the number of queries in the
batch. Finally, $s$ is the number of rows of queries in the base $A$,
which is only used in the generation of WRelated. The range of all
these five parameters is summarized in Table 1. Unless otherwise
specified, the default parameters in bold are used. Moreover, we
test three different privacy budgets, $\epsilon$ = 1, 0.1 and 0.01. Note that
the squared error incurred by all the methods is quadratic in $1/\epsilon$.

Figure 4: Effect of varying domain size $n$ on workload WDiscrete with $\epsilon = 0.1$
Figure 5: Effect of domain size $n$ on workload WRange with $\epsilon = 0.1$



Parameter   Values
---------   ------------------------------------------------------------------
$\gamma$    0.0001, 0.001, 0.01, 0.1, 1, 10
$r$         {0.8, 1.0, 1.2, 1.4, 1.7, 2.1, 2.5, 3.0, 3.6} × rank(W)
$n$         128, 256, 512, 1024, 2048, 4096, 8192
$m$         64, 128, 256, 512, 1024
$s$         {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0} × min(m, n)

Table 1: Parameters used in the experiments



In the experiments, we measure Average Squared Error and Computation
Time of the methods. Specifically, the Average Squared
Error is the average squared $L_2$ distance between the exact query
answers and the noisy answers. In the following, we first examine
the impact of $\gamma$ and $r$, which are only used in the LRM method.
The results provide important insights on how to tune these two
parameters to maximize the utility of the LRM method.
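The error measure can be sketched in a few lines of Python. The snippet below assumes that the average is taken over repeated runs of a mechanism, and it perturbs hypothetical exact answers with Laplace noise of a hypothetical scale purely for illustration; the names `laplace_noise` and `average_squared_error` are not from the original implementation.

```python
import random

def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as a difference of exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def average_squared_error(exact, noisy_runs):
    """Mean over runs of the squared L2 distance to the exact answers."""
    total = 0.0
    for noisy in noisy_runs:
        total += sum((e - z) ** 2 for e, z in zip(exact, noisy))
    return total / len(noisy_runs)

rng = random.Random(0)
exact = [10.0, 25.0, 7.0]   # hypothetical exact answers to three queries
scale = 1.0 / 0.1           # hypothetical scale: sensitivity 1, epsilon 0.1
runs = [[e + laplace_noise(scale, rng) for e in exact] for _ in range(20)]
print(average_squared_error(exact, runs))
```

Since a Laplace variable with scale $b$ has variance $2b^2$, the expected value of this measure grows quadratically in $1/\epsilon$ when the scale is proportional to $1/\epsilon$.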

6.1 Impact of $\gamma$ and $r$ on LRM

In LRM, $\gamma$ is an important parameter controlling the relaxation
on the approximation of $BL$ to $W$. In our first set of experiments,
we investigate the impact of $\gamma$ on the accuracy and the efficiency of
LRM. Figure 2 reports the performance of LRM under all three different
workloads, WDiscrete, WRange and WRelated on the Search
Log dataset with varying values for $\gamma$. The results in the figure show
that the errors of LRM on all three workloads are not sensitive to
$\gamma$ in the range from $10^{-4}$ to 10. On the other hand, LRM executes
much faster with larger $\gamma$. This suggests that a larger value for $\gamma$
is preferred in practice, to achieve high efficiency without losing
much on result accuracy. Moreover, we also test with three different
values of the privacy budget $\epsilon$. Since the decomposition method
does not rely on $\epsilon$, the shapes of the result curves with different $\epsilon$
values are nearly identical, albeit at different scales. The average
error is quadratic in $1/\epsilon$ as expected.

In LRM, r is another important parameter that determines the 
rank of the matrix BL that approximates the workload W. r af- 
fects both the approximation accuracy and the optimization speed. 



When r is too small, e.g., when r < rank(W), our optimization 
formulation may fail to find a good approximation, leading to sub- 
optimal accuracy for the query batch. On the other hand, an overly 
large r leads to poor efficiency, as the search space expands dramat- 
ically. We thus test LRM with varying r, by controlling the ratio 
of r to the actual rank rank(W), on the Search Log dataset. We 
record the average squared error under all the workloads and report 
it in Figure 3. 

There are several important observations in Figure 3. First, when
$r < \mathrm{rank}(W)$, the accuracy of LRM is far worse (up to two orders
of magnitude) than that in other settings. Second, the performance
of LRM is rather stable when $r$ becomes larger than $1.2 \cdot \mathrm{rank}(W)$.
This is because the optimization formulation has enough freedom
to find the optimal decomposition when $r$ is larger than $\mathrm{rank}(W)$.
Finally, the amount of computation spent on workload decomposition
increases exponentially with $r$. Thus, to balance the efficiency
and effectiveness of LRM, a good value for $r$ is between $\mathrm{rank}(W)$
and $1.2 \cdot \mathrm{rank}(W)$. We use the latter as the default value in the
subsequent experiments.

6.2 Impact of Varying Domain Size n 

We now evaluate the performance of all mechanisms with varying
domain size $n$. As mentioned earlier in this section, the domain
size is controlled by merging consecutive counts in the original domain.
While different workloads and datasets are used, we only test
with $\epsilon = 0.1$ because $\epsilon$ does not have much impact on the relative
performance of different mechanisms. In Figures 4, 5 and 6, we
report the result errors of all these mechanisms.

In all the experiments, the Matrix Mechanism (MM) is much
worse than the other mechanisms, sometimes by an order of magnitude.
This is mainly because 1) the strategy matrix in MM must
be a full rank matrix; and 2) the $L_2$ approximation used by MM
does not lead to a good optimization of the actual objective function
formulated using the error measure in $L_1$. Because of its poor
performance, we exclude MM in the rest of the experiments.

On the WDiscrete workload, the Laplace Mechanism (LM) outperforms
all other mechanisms when the domain size is relatively
small. This is in part due to the fact that the Wavelet Mechanism
(WM) and the Hierarchical Mechanism (HM) are mainly designed
to optimize range queries. While all other mechanisms incur linear
error in terms of the domain size $n$, LRM's error stops increasing
when the domain size is larger than 512. This is because LRM's
error relies on the rank of the workload matrix $W$, and $\mathrm{rank}(W)$
is no larger than $\min(m, n)$ no matter how large $n$ is. This
explains the excellent performance of LRM on larger domains. On
the WRange workload, the errors of WM and HM are smaller than
LM when the domain size is no smaller than 512, in which case
their strategies work better. LRM's performance is still significantly
better than any of them, since LRM fully utilizes the
correlations between these range queries on large domains. Finally,
on the WRelated workload, LRM achieves the best performance on
all test cases. The performance gap between LRM and other methods
is over two orders of magnitude, when the domain size reaches
8192. Since WRelated naturally leads to a low rank workload matrix
$W$, this result verifies LRM's vast benefit from exploiting the
low-rank property of the workload.

6.3 Impact of Varying Query Size m

In this subsection, we test the impact of the query set cardinality
$m$ on the performance of the mechanisms. We mainly focus on
settings when the number of queries $m$ is no larger than the domain
size $n$, i.e. $m \le n$. Due to space limitations, we only present the
results on WRange and WRelated workloads in Figures 7 and 8.

Figure 7: Effect of number of queries $m$ on workload WRange with $\epsilon = 0.1$

The results lead to several interesting observations. On the WRange
workload (Figure 7), LRM outperforms the other mechanisms, when
the number of queries $m$ is significantly smaller than $n$. With growing
$m$, the performance of all mechanisms on WRange tends to
converge. When $m = 1024$, WM achieves the best performance
among all mechanisms, since it is optimized for range queries. The
degeneration in performance of LRM is due to the lack of low rank
property when the batch contains too many random range queries.
On the WRelated workload, LRM is dramatically better than the other
methods, for any query set cardinality $m$. Regardless of the value
of $m$, the rank of the WRelated workload $W$ remains low, which is
solely determined by the parameter $s$ used in the workload generation
procedure. These results further confirm that the squared error
generated by LRM scales linearly with the rank of the workload.

6.4 Impact of Varying Query Rank s 

All previous experiments demonstrate LRM's substantial performance
advantage when the workload matrix has low rank. In this
group of experiments, we manually control the rank of workload
$W$ to verify the correctness of our claim. Recall that the parameter
$s$ determines the size of the matrix $C_{m \times s}$ and the size of the
matrix $A_{s \times n}$ in the generation of the WRelated workload. When
$C$ and $A$ contain only independent rows/columns, $s$ is exactly the
rank of the workload matrix $W = CA$. In Figure 9, we vary $s$ from
$0.1 \min(m, n)$ to $\min(m, n)$. Compared to the other mechanisms,
LRM maintains an accuracy advantage of over two orders of magnitude,
when the rank of the workload matrix is low. With increasing
rank of $W$, the accuracy of other mechanisms remains stable,
while LRM's error grows rapidly. This phenomenon again confirms
that the low rank property is the main reason behind LRM's
advantages with respect to error minimization.



7. CONCLUSION 

This paper presented the Low Rank Mechanism (LRM), an optimization
framework that minimizes the overall error in the results
of a batch of linear queries under $\epsilon$-differential privacy. LRM is the
first practical method for a large number of linear queries, with an
efficient and effective implementation using well established optimization
techniques. Experiments show that LRM significantly
outperforms other state-of-the-art differentially private query processing
mechanisms, often by orders of magnitude. The current
design of LRM focuses on exploiting the correlations between different
queries. One interesting direction for future work is to further
optimize LRM by also utilizing the correlations between data
values, e.g., as is done in [29, 24, 17].

APPENDIX
A. PROOFS

Lemma 1:

PROOF. Based on the definition of the mechanism in Eq. (6),
the residual of the noisy result with respect to the exact result, i.e.
$Q(D) - M_P(Q, D)$, is $B \cdot \mathrm{Lap}\left(\frac{\Delta(B,L)}{\epsilon}\right)^r$. The expected squared
error is thus $\sum_{ij} B_{ij}^2 \cdot 2\left(\frac{\Delta(B,L)}{\epsilon}\right)^2$. Since $\Phi(B, L) = \sum_{ij} B_{ij}^2$, the
expected error of the mechanism is $2\Phi(B, L)(\Delta(B, L))^2/\epsilon^2$. □
Lemma 2:

PROOF. Based on the definition of sensitivity, we have

$$\Delta(B', L') = \max_j \sum_i |L'_{ij}| = \max_j \sum_i |L_{ij}/\alpha| = \alpha^{-1}\Delta(B, L).$$

The last equality holds because $\alpha$ is a positive constant. On the
other hand, the scales of the decompositions follow a similar
relationship:

$$\Phi(B', L') = \sum_{ij} (B'_{ij})^2 = \sum_{ij} \alpha^2 (B_{ij})^2 = \alpha^2 \Phi(B, L)$$

Therefore, $\Phi(B', L')(\Delta(B', L'))^2 = \Phi(B, L)(\Delta(B, L))^2$. Finally,
since $B'L' = BL = W$, we reach the conclusion of the
lemma. □
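The scale invariance used in this proof is easy to confirm numerically. The following sketch is only an illustration with hypothetical matrices and helper names (`phi`, `delta`, `rescale`): it rescales a pair $(B, L)$ by a positive constant $\alpha$, as in the proof, and compares the product $\Phi \cdot \Delta^2$ before and after.

```python
def phi(B):
    """Scale of the decomposition: sum of the squared entries of B."""
    return sum(b * b for row in B for b in row)

def delta(L):
    """Sensitivity: maximum L1 norm over the columns of L."""
    return max(sum(abs(v) for v in col) for col in zip(*L))

def rescale(B, L, alpha):
    """Return (alpha * B, L / alpha); the product B L is unchanged."""
    B2 = [[alpha * b for b in row] for row in B]
    L2 = [[v / alpha for v in row] for row in L]
    return B2, L2

B = [[1.0, 2.0], [3.0, 4.0]]       # hypothetical factor B
L = [[0.5, -1.0], [2.0, 0.25]]     # hypothetical factor L
B2, L2 = rescale(B, L, alpha=4.0)
print(phi(B) * delta(L) ** 2, phi(B2) * delta(L2) ** 2)
```

Both printed values coincide, so any decomposition can be rescaled to have sensitivity exactly 1 without changing its expected error, which is what allows the sensitivity to be fixed as a constraint in the optimization program.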

Theorem 1:

PROOF. Assume that $(B^*, L^*)$ is the best matrix decomposition
for minimizing the expected squared error for $M_P(Q, D)$. In the
following, we prove that $(B^*, L^*)$ is optimal, if and only if it also
minimizes the program in Formula (7).

(if part): Suppose that $(B, L)$ minimizes Formula (7) but incurs
more expected error than $(B^*, L^*)$, implying that

$$\Phi(B^*, L^*)(\Delta(B^*, L^*))^2 < \Phi(B, L)(\Delta(B, L))^2$$

By applying Lemma 2, we can construct another decomposition
$B' = \Delta(B^*, L^*)B^*$ and $L' = \Delta(B^*, L^*)^{-1}L^*$, such that
$\Phi(B', L')(\Delta(B', L'))^2 < \Phi(B, L)(\Delta(B, L))^2$. On the other hand,
since $\Delta(B', L') = 1$ by construction, we have $\max_j \sum_i |L'_{ij}| = 1$. Therefore, we
can derive the following inequalities.

$$
\begin{aligned}
\Phi(B', L') &= \Phi(B', L')(\Delta(B', L'))^2 \\
&< \Phi(B, L)(\Delta(B, L))^2 \\
&\le \Phi(B, L)
\end{aligned}
$$

Finally, since $\Phi(B', L') = \mathrm{tr}(B'^T B')$ and $\Phi(B, L) = \mathrm{tr}(B^T B)$,
it leads to a contradiction if $\mathrm{tr}(B'^T B') < \mathrm{tr}(B^T B)$.

(only if part): If $(B^*, L^*)$ is not the optimal solution to the program
in Formula (7), the optimal solution $(B, L)$ must incur less
expected error, using a similar strategy. This completes the proof
of the theorem. □

Lemma 3:

PROOF. To prove the lemma, we aim to artificially construct a
workload decomposition $W = BL$ satisfying the constraints of the
optimization formulation. If the error of this artificial decomposition
is no larger than the upper bound, the exact optimal solution
must render results with less error.

Recall that $W$ has a unique SVD decomposition $W = U\Sigma V$
such that $\Sigma$ is a diagonal matrix of size $r \times r$. We thus build a
decomposition $B = \sqrt{r}\,U\Sigma$ and $L = \frac{1}{\sqrt{r}}V$, in which $r$ is the
rank of the matrix $W$. First, we will show such $(B, L)$ satisfies the
constraints in Formula (7). It is straightforward to show it satisfies
the first constraint: $BL = \sqrt{r}\,U\Sigma\frac{1}{\sqrt{r}}V = U\Sigma V = W$.

Regarding the second constraint, since $V$ only contains orthogonal
vectors, every column $v = V_{\cdot j}$ must have $\|v\|_2 = 1$. By the
norm inequality $\|v\|_2 \le \|v\|_1 \le \sqrt{r}\,\|v\|_2$, we obtain
$\sum_i |L_{ij}| \le 1$. Therefore, such $(B, L)$ must be a valid solution
to the program.

The expected squared error of the artificial decomposition $W = BL$
is at most

$$
\begin{aligned}
\mathrm{tr}(B^T B)/\epsilon^2 &= \mathrm{tr}\big((\sqrt{r}\,U\Sigma)^T(\sqrt{r}\,U\Sigma)\big)/\epsilon^2 \\
&= \mathrm{tr}(\Sigma^T U^T U \Sigma)\, r/\epsilon^2 \\
&= \sum_{k=1}^{r} \lambda_k^2\, r/\epsilon^2
\end{aligned}
$$

This proves that $\sum_{k=1}^{r} \lambda_k^2\, r/\epsilon^2$ is an upper bound for the noise
of our decomposition-based scheme. □

Lemma 4:

PROOF. In Corollary 3.4 in [14], Hardt and Talwar proved that
any $\epsilon$-differential privacy mechanism incurs expected squared error
no less than$^1$ $\Omega\big(r^3 (\mathrm{Vol}(PWB_1^n))^{2/r}/\epsilon^2\big)$.

In the formula above, $B_1^n$ is the $\ell_1$-unit ball. $\mathrm{Vol}(PWB_1^n)$ is
the volume of the unit ball after the linear transformation under
$PW$, in which $P$ is any orthogonal linear transformation matrix
from $\mathbb{R}^m \to \mathbb{R}^r$. To prove the lemma, we construct an orthogonal
transformation $P$ using the SVD decomposition over $W = U\Sigma V$.
By simply letting $P = U^T$, since $U^T U$ and $VV^T$ are identity
matrices, we have $\mathrm{Vol}(PWB_1^n) = \mathrm{Vol}(PUVV^T\Sigma VB_1^n) =
\mathrm{Vol}(V(V^T\Sigma V)B_1^n) = \mathrm{Vol}(VB_1^n)\prod_{k=1}^{r}\lambda_k$. The last equality
holds due to Lemma 7.5 in [14]. Consider the convex body
$VB_1^n$. It is an $r$-dimensional unit ball after the orthogonal
transformation under $V$. Note that $\mathrm{Vol}(B_1^r)$ can be computed using the
well known $\Gamma$ function, as in [26], $\frac{2^r}{\Gamma(r+1)} = \frac{2^r}{r!}$. Therefore,
the lower bound can be computed as: $\Omega\Big(\big(\frac{2^r}{r!}\prod_{k=1}^{r}\lambda_k\big)^{2/r} r^3/\epsilon^2\Big)$.
This reaches the conclusion of the lemma. □

Theorem 2:

PROOF. To prove the theorem, we investigate the ratio of the
upper bound to the lower bound.

$$
\frac{\sum_{k=1}^{r}\lambda_k^2}{\left(\frac{2^r}{r!}\prod_{k=1}^{r}\lambda_k\right)^{2/r} r^2}
\le \frac{r\lambda_1^2}{\left(\frac{2^r}{r!}\right)^{2/r}\lambda_r^2\, r^2}
= \frac{C^2}{\left(\frac{2^r}{r!}\right)^{2/r} r}
\le \frac{C^2 r}{16}
$$

The last inequality holds due to the fact that $r! \le (r/2)^r$ when
$r > 5$. Note that all the inequalities above are tight, and the equalities
hold when $C = 1$, i.e. $\lambda_1 = \lambda_2 = \ldots = \lambda_r$. Thus, we
prove that the approximation factor of our decomposition scheme
is $O(C^2 r)$. □

$^1$ [14] used absolute error in the paper, which we change to squared
error here.

Theorem 3:

PROOF. When $W \ne BL$, the error has two parts. The first part
is the noise due to the Laplace random variables. Using Lemma 1,
the incurred error is at most $\frac{2}{\epsilon^2}\Phi(B, L)(\Delta(B, L))^2 \le \frac{2}{\epsilon^2}\mathrm{tr}(B^T B)$.

The second part of the error is the structural error on the results.
The expected squared error is measured as

$$((W - BL)D)^T (W - BL)D \le \|W - BL\|_F^2\, D^T D = \|W - BL\|_F^2 \sum_{i=1}^{n} x_i^2$$

The inequality is due to the Cauchy-Schwarz inequality. By
linearity of expectation, the expected squared errors can be simply
summed up. This leads to the conclusion of the theorem. □

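Theorem 4:

PROOF. We use $B^{(k)*}$ to denote the optimal solution of the Lagrangian
sub-problem in the $k$-th iteration. Note the following inequality
on the sequence of the Lagrangian subproblems:

$$
\begin{aligned}
J(B^{(k+1)*}, L^{(k+1)*}, \pi^{(k)*}, \beta^{(k)})
&= \min_{B, L} J(B, L, \pi^{(k)*}, \beta^{(k)}) \\
&\le \min_{\|W - BL\|_F \le \gamma,\ \forall j \sum_i |L_{ij}| \le 1} J(B, L, \pi^{(k)*}, \beta^{(k)}) \\
&= \min_{\|W - BL\|_F \le \gamma,\ \forall j \sum_i |L_{ij}| \le 1} \frac{1}{2}\mathrm{tr}(B^T B) = \frac{1}{2}\mathrm{tr}(B^{*T} B^*)
\end{aligned}
$$

Based on the above inequality, we derive the following inequality:

$$
\begin{aligned}
\frac{1}{2}\mathrm{tr}(B^{(k+1)T} B^{(k+1)})
&= J(B^{(k+1)*}, L^{(k+1)*}, \pi^{(k)*}, \beta^{(k)}) - \langle \pi^{(k)*}, W - B^{(k+1)*}L^{(k+1)*} \rangle - \frac{\beta^{(k)}}{2}\|W - B^{(k+1)*}L^{(k+1)*}\|_F^2 \\
&= J(B^{(k+1)*}, L^{(k+1)*}, \pi^{(k)*}, \beta^{(k)}) - \frac{1}{2\beta^{(k)}}\left(\|\pi^{(k+1)*}\|_F^2 - \|\pi^{(k)*}\|_F^2\right) \\
&\le \frac{1}{2}\mathrm{tr}(B^{*T} B^*) - \frac{1}{2\beta^{(k)}}\left(\|\pi^{(k+1)*}\|_F^2 - \|\pi^{(k)*}\|_F^2\right)
\end{aligned}
$$

The second equality holds because of the Lagrangian multiplier
update rule:

$$W - B^{(k+1)*}L^{(k+1)*} = \frac{1}{\beta^{(k)}}\left(\pi^{(k+1)*} - \pi^{(k)*}\right)$$

Since $\|\pi^{(k)*}\|_F$ is always bounded, we conclude that

$$\frac{1}{2}\mathrm{tr}(B^{(k+1)T} B^{(k+1)}) - \frac{1}{2}\mathrm{tr}(B^{*T} B^*) \le O\left(\frac{1}{\beta^{(k)}}\right)$$

This completes the proof of the theorem. □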

B. IMPLEMENTATION OF THE MATRIX MECHANISM

In [16], Li et al. propose the Matrix Mechanism. The core of
their method is finding a matrix $A$ to minimize the following
program.

$$\min_{A \in \mathbb{R}^{r \times n}} \|A\|_2^2\, \mathrm{tr}(W^T W A^{\dagger} A^{\dagger T}) \tag{13}$$

Li et al. [16] present a complicated implementation that is
rather impractical due to its prohibitively high complexity. We
hereby present a simpler and more efficient solution to their
optimization program. Here $\|A\|_2^2$ denotes the maximum $L_2$ norm
of column vectors of $A$, therefore $\|A\|_2^2 = \max(\mathrm{diag}(A^T A))$.

Since $(A^T A)^{-1} = (A^T A)^{\dagger}$ ($A$ has full column rank), we let $M =
A^T A$, and reformulate Formula (13) as the following semidefinite
programming problem:

$$\min_{M \in \mathbb{S}^{n \times n}} \max(\mathrm{diag}(M))\, \mathrm{tr}(W^T W M^{-1}) \quad \text{s.t.}\ M \succ 0$$

$A$ is given by $A = \sum_{i=1}^{n} \sqrt{\lambda_i}\, v_i v_i^T$, where $\lambda_i, v_i$ are the $i$-th
eigenvalue and eigenvector of $M$, respectively. Calculating the
second term $\mathrm{tr}(W^T W M^{-1})$ is relatively straightforward. Since
it is smooth, its gradient can be computed as $-M^{-1} W^T W M^{-1}$.
However, calculating the first term $\max(\mathrm{diag}(M))$ is harder since
it is non-smooth. Fortunately, inspired by [7], we can still use a
logarithmic and exponential function to approximate this term.

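Approximate the maximum positive number: Since $M$ is positive
definite, $v = \mathrm{diag}(M) > 0$. We let $\mu > 0$ and define:

$$f_\mu(v) = \mu \log \sum_{i=1}^{n} \exp\left(\frac{v_i}{\mu}\right) \tag{14}$$

We then have $\max(v) \le f_\mu(v) \le \max(v) + \mu \log n$. If we
set $\mu = \frac{\epsilon}{\log n}$, this becomes a uniform $\epsilon$-approximation of $\max(v)$
with a Lipschitz continuous gradient with constant $\omega = \frac{1}{\mu} = \frac{\log n}{\epsilon}$.
The gradient of the objective function with respect to $v$ can be
computed as:

$$\frac{\partial f}{\partial v_i} = \frac{\exp\left(\frac{v_i - \max(v)}{\mu}\right)}{\sum_{j=1}^{n} \exp\left(\frac{v_j - \max(v)}{\mu}\right)} \tag{15}$$

To mitigate the problems with large numbers, using the property
of the logarithmic and exponential functions, we can rewrite Eq.
(14) and Eq. (15) as:

$$f_\mu(v) = \max(v) + \mu \log\left(\sum_{i=1}^{n} \exp\left(\frac{v_i - \max(v)}{\mu}\right)\right), \qquad \frac{\partial f}{\partial v_i} = \left(\sum_{j=1}^{n} \exp\left(\frac{v_j - v_i}{\mu}\right)\right)^{-1}$$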

This formulation allows us to run the non-monotone projected
gradient descent algorithm [2] and iteratively improve the result.